PROCESSING OF MULTI-CHANNEL SIGNALS

BACKGROUND OF THE INVENTION 
Field Of The Invention 

[0001] The present invention relates to the processing of audio 
signals and, more particularly, to the coding of multi-channel audio 
signals.

Description Of The Related Art 

[0002] Parametric multi-channel audio coders generally transmit 
only one full-bandwidth audio channel combined with a set of 
parameters that describe the spatial properties of an input signal. 
For example, Fig. 1 shows the steps performed in an encoder 10 
described in International Application No. WO2003/90208, filed 
April 22, 2003 (Attorney Docket No. PHNL021156). 

[0003] In an initial step S1, input signals L and R are split 
into subbands 101, for example, by time-windowing followed by a 
transform operation. Subsequently, in step S2, the level difference 
(ILD) of corresponding subband signals is determined; in step S3, 
the time difference (ITD or IPD) of corresponding subband signals 
is determined; and in step S4, the amount of similarity or 
dissimilarity of the waveforms that cannot be accounted for by 
ILDs or ITDs is described. In the subsequent steps S5, S6, and S7, 
the determined parameters are quantized.

[0004] In step S8, a monaural signal S is generated from the 
incoming audio signals, and finally, in step S9, a coded signal 102 
is generated from the monaural signal and the determined spatial 
parameters.

[0005] Fig. 2 shows a schematic block diagram of a coding system 
comprising the encoder 10 and a corresponding decoder 202. The 
coded signal 102, comprising the sum signal S and spatial 
parameters P, is communicated to the decoder 202. The signal 102 may 
be communicated via any suitable communications channel 204. 
Alternatively, or additionally, the signal may be stored on a 
removable storage medium 214, which may be transferred from the 
encoder to the decoder.

[0006] Synthesis (in the decoder 202) is performed by applying 
the spatial parameters to the sum signal to generate left and right 
output signals. Hence, the decoder 202 comprises a decoding module 
210, which performs the inverse operation of step S9 and extracts 
the sum signal S and the parameters P from the coded signal 102. 
The decoder further comprises a synthesis module 211, which recovers 
the stereo components L and R from the sum (or dominant) signal and 
the spatial parameters.

[0007] One of the challenges is to generate the monaural signal 
S (step S8) in such a way that, on decoding into the output 
channels, the perceived sound timbre is exactly the same as for the 
input channels.

[0008] Several methods of generating this sum signal have been 
suggested previously. In general, these methods compose a mono 
signal as a linear combination of the input signals. Particular 
techniques include:

1. Simple summation of the input signals. See, for example, 
'Efficient representation of spatial audio using perceptual 
parametrization', by C. Faller and F. Baumgarte, WASPAA'01, 
Workshop on applications of signal processing on audio and 
acoustics, New Paltz, New York, 2001.

2. Weighted summation of the input signals using principal 
component analysis (PCA). See, for example, International Patent 
Application No. WO2003/85645, filed March 20, 2003 (Attorney Docket 
No. PHNL020284) and International Patent Application No. 
WO2003/85643, filed March 20, 2003 (Attorney Docket No. PHNL020283). 
In this scheme, the squared weights of the summation sum to one 
and the actual values depend on the relative energies in the input 
signals.

3. Weighted summation with weights depending on the time-domain 
correlation between the input signals. See, for example, 
'Joint stereo coding of audio signals', by D. Sinha, European 
Patent Application No. EP 1 107 232 A2. In this method, the weights 
sum to +1, while the actual values depend on the cross-correlation 
of the input channels.

4. U.S. Patent 5,701,346 to Herre et al. discloses weighted 
summation with energy-preservation scaling for downmixing left, 
right, and center channels of wideband signals. However, this is 
not performed as a function of frequency.

[0009] These methods can be applied to the full-bandwidth signal 
or to band-filtered signals, with each frequency band having its own 
weights. However, all of the methods described share one drawback. 
If the cross-correlation is frequency-dependent, which is very often 
the case for stereo recordings, coloration (i.e., a change of the 
perceived timbre) of the sound reproduced by the decoder occurs.

[0010] This can be explained as follows. For a frequency band 
that has a cross-correlation of +1, linear summation of two input 
signals results in a linear addition of the signal amplitudes, and 
the resultant energy is found by squaring the summed amplitude. 
(For two in-phase signals of equal amplitude, this results in a 
doubling of amplitude with a quadrupling of energy.) If the 
cross-correlation is 0, linear summation results in less than a 
doubling of the amplitude and less than a quadrupling of the energy. 
Furthermore, if the cross-correlation for a certain frequency band 
amounts to -1, the signal components of that frequency band cancel 
out and no signal remains. Hence, for simple summation, the 
frequency bands of the sum signal can have an energy (power) between 
0 and four times the power of the two input signals, depending on 
the relative levels and the cross-correlation of the input signals.
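By way of illustration only, the dependence of the summed energy on the cross-correlation can be checked numerically. The following Python sketch is not part of the source: the function name `sum_energy_ratio`, the use of NumPy, and the construction of test signals from Gaussian noise are all assumptions made for the example. For two equal-energy signals, the ratio it returns should follow 2 + 2ρ, where ρ is the cross-correlation.

```python
import numpy as np

def sum_energy_ratio(rho, n=4096, seed=0):
    """Energy of L+R divided by the energy of L, for equal-energy L and R
    whose normalized cross-correlation is rho (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    left = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    # Make `noise` orthogonal to `left` and give it the same energy, so the
    # mix below has the requested sample correlation (up to rounding).
    noise -= left * (noise @ left) / (left @ left)
    noise *= np.linalg.norm(left) / np.linalg.norm(noise)
    right = rho * left + np.sqrt(1.0 - rho**2) * noise
    return np.sum((left + right) ** 2) / np.sum(left**2)

for rho in (1.0, 0.0, -1.0):
    print(rho, sum_energy_ratio(rho))
```

Under these assumptions, a correlation of +1 quadruples the energy of one channel, a correlation of 0 doubles it, and a correlation of -1 cancels it.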

SUMMARY OF THE INVENTION 
[0011] The present invention attempts to mitigate this problem 
and provides a method of generating a monaural signal (S) 
comprising a combination of at least two input audio channels (L, 
R), comprising the steps of:

[0012] for each of a plurality of sequential segments (t(n)) of 
said audio channels (L, R), summing (46) corresponding frequency 
components from respective frequency spectrum representations for 
each audio channel (L(k), R(k)) to provide a set of summed 
frequency components (S(k)) for each sequential segment;

[0013] for each of said plurality of sequential segments, 
calculating (45) a correction factor (m(i)) for each of a plurality 
of frequency bands (i) as a function of the energy of the frequency 
components of the summed signal in said band 
($\sum_{k \in i} |S(k)|^2$) and the energy of said frequency 
components of the input audio channels in said band 
($\sum_{k \in i} \{ |L(k)|^2 + |R(k)|^2 \}$); and

[0014] correcting (47) each summed frequency component as a 
function of the correction factor (m(i)) for the frequency band of 
said component.

[0015] If different frequency bands tended, on average, to have 
the same correlation, then one might expect that, over time, 
distortion caused by such summation would average out over the 
frequency spectrum. However, it has been recognized that, in 
multi-channel signals, low-frequency components tend to be more 
correlated than high-frequency components. Therefore, it will be 
seen that, without the present invention, summation that does not 
take into account the frequency-dependent correlation of channels 
would tend to unduly boost the energy levels of the more highly 
correlated and, in particular, psycho-acoustically sensitive 
low-frequency bands.

[0016] The present invention provides a frequency-dependent 
correction of the mono signal where the correction factor depends 
on a frequency-dependent cross-correlation and relative levels of 
the input signals. This method reduces spectral coloration 
artefacts which are introduced by known summation methods and 
ensures energy preservation in each frequency band.

[0017] The frequency-dependent correction can be applied by 
first summing the input signals (by either linear or weighted 
summation) and then applying a correction filter, or by relaxing the 
constraint that the weights for summation (or their squared values) 
necessarily sum to +1, so that they instead sum to a value that 
depends on the cross-correlation.

[0018] It should be noted that the invention can be applied to 
any system where two or more input channels are combined.

BRIEF DESCRIPTION OF THE DRAWINGS 
[0019] Embodiments of the invention will now be described with 
reference to the accompanying drawings, in which:
[0020] Figure 1 shows a prior art encoder;
[0021] Figure 2 shows a block diagram of an audio system 
including the encoder of Figure 1;

[0022] Figure 3 shows the steps performed by a signal summation 
component of an audio coder according to a first embodiment of the 
invention; and

[0023] Figure 4 shows linear interpolation of the correction 
factors m(i) applied by the summation component of Figure 3.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0024] According to the present invention, there is provided an 
improved signal summation component (S8'), in particular, for 
performing the step corresponding to S8 of Figure 1. Nonetheless, 
it will be seen that the invention is applicable anywhere two or 
more signals need to be summed. In a first embodiment of the 
invention, the summation component adds left and right stereo 
channel signals prior to the summed signal S being encoded, step 
S9.

[0025] Referring now to Figure 3, in the first embodiment, the 
left (L) and right (R) channel signals provided to the summation 
component comprise multi-channel segments m1, m2, ... overlapping in 
successive time frames t(n-1), t(n), t(n+1). Typically, sinusoids 
are updated every 10 ms and each segment m1, m2, ... is twice the 
length of the update interval, i.e., 20 ms.

[0026] For each overlapping time window t(n-1), t(n), t(n+1) for 
which the L, R channel signals are to be summed, the summation 
component uses a (square-root) Hanning window function to combine 
each channel signal from overlapping segments m1, m2, ... into a 
respective time-domain signal representing each channel for a time 
window, step 42.

[0027] An FFT (Fast Fourier Transform) is applied to each 
time-domain windowed signal, resulting in a respective complex 
frequency spectrum representation of the windowed signal for each 
channel, step 44. For a sampling rate of 44.1 kHz and a frame length 
of 20 ms, the length of the FFT is typically 882. This process 
results in a set of K frequency components for both input channels 
(L(k), R(k)).
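Steps 42 and 44 may be sketched as follows, purely by way of example. The names `FS`, `FRAME`, `HOP`, `sqrt_hann` and `frame_spectrum` are hypothetical, and the choice of a periodic (rather than symmetric) Hann window is an assumption that the source does not specify.

```python
import numpy as np

FS = 44_100               # sampling rate in Hz, as stated in the text
FRAME = FS * 20 // 1000   # 20 ms frame -> 882 samples
HOP = FRAME // 2          # 10 ms update interval -> 441 samples

def sqrt_hann(n):
    """Square-root of a periodic Hann window (assumed variant).

    The squared window sums to 1 when frames overlap by 50%."""
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n))

def frame_spectrum(x, start):
    """Window one FRAME-long frame of x at `start` and return its complex FFT."""
    segment = x[start:start + FRAME] * sqrt_hann(FRAME)
    return np.fft.fft(segment)

# Example: spectrum of the first frame of one channel of a test tone.
t = np.arange(2 * FRAME) / FS
L_k = frame_spectrum(np.sin(2.0 * np.pi * 440.0 * t), 0)
```

In this sketch each channel would be passed through `frame_spectrum` separately, giving K = 882 complex components per channel per frame.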

[0028] In the first embodiment, the two input channel 
representations L(k) and R(k) are first combined by a simple linear 
summation, step 46. It will be seen, however, that this could 
easily be extended to a weighted summation. Thus, for the present 
embodiment, sum signal S(k) comprises:

S(k) = L(k) + R(k)

Separately, the frequency components of the input signals L(k) and 
R(k) are grouped into several frequency bands, preferably using 
perceptually-related bandwidths (ERB or BARK scale) and, for each 
subband i, an energy-preserving correction factor m(i) is computed, 
step 45:

$$m^2(i) = \frac{\sum_{k \in i} \{ |L(k)|^2 + |R(k)|^2 \}}{2 \sum_{k \in i} |S(k)|^2} = \frac{\sum_{k \in i} \{ |L(k)|^2 + |R(k)|^2 \}}{2 \sum_{k \in i} |L(k) + R(k)|^2} \qquad \text{(Equation 1)}$$

which can also be written as:

$$m^2(i) = \frac{\sum_{k \in i} \{ |L(k)|^2 + |R(k)|^2 \}}{2 \left\{ \sum_{k \in i} |L(k)|^2 + \sum_{k \in i} |R(k)|^2 + 2 \rho_{LR}(i) \sqrt{\sum_{k \in i} |L(k)|^2 \sum_{k \in i} |R(k)|^2} \right\}} \qquad \text{(Equation 2)}$$

with $\rho_{LR}(i)$ being the (normalized) cross-correlation of the 
waveforms of subband i, a parameter used elsewhere in parametric 
multi-channel coders and so readily available for the calculations 
of Equation 2. In any case, step 45 provides a correction factor 
m(i) for each subband i.
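A minimal sketch of step 45, assuming the form of Equation 1 in which the input energy of a band is divided by twice the energy of the sum in that band. The function name `correction_factors`, the `band_edges` representation of the subbands, and the small floor guarding against division by zero are illustrative assumptions, not taken from the source.

```python
import numpy as np

def correction_factors(L, R, band_edges):
    """Return m(i) for each band, following Equation 1.

    L, R       : complex spectra of one frame (equal length).
    band_edges : bin indices; band i covers bins band_edges[i]:band_edges[i+1].
    """
    S = L + R
    m = np.empty(len(band_edges) - 1)
    for i, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        e_in = np.sum(np.abs(L[lo:hi]) ** 2 + np.abs(R[lo:hi]) ** 2)
        e_sum = np.sum(np.abs(S[lo:hi]) ** 2)
        # The floor is an assumption of this sketch; it only prevents a
        # division by zero when the channels cancel completely in a band.
        m[i] = np.sqrt(e_in / (2.0 * max(e_sum, 1e-12)))
    return m
```

For identical channels this sketch gives m(i) = 1/2 in every band, and for equal-energy uncorrelated channels it gives m(i) = 1/√2, so that the corrected band energy equals the mean of the two input energies in both cases.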

[0029] The next step 47 then comprises multiplying each 
frequency component S(k) of the sum signal by a correction filter 
C(k):

S'(k) = S(k)C(k) = C(k)L(k) + C(k)R(k)    (Equation 3)

[0030] It will be seen from the last expression of Equation 3 
that the correction filter can be applied either to the summed 
signal S(k) alone or to each input channel (L(k), R(k)). As such, 
steps 46 and 47 can be combined when the correction factor m(i) is 
known, or performed separately with the summed signal S(k) being 
used in the determination of m(i), as indicated by the hashed line 
in Figure 3.

[0031] In the preferred embodiments, the correction factors m(i) 
are used for the center frequencies of each subband, while for 
other frequencies, the correction factors m(i) are interpolated to 
provide the correction filter C(k) for each frequency component (k) 
of a subband i. In principle, any interpolation function can be 
used; however, empirical results have shown that a simple linear 
interpolation scheme suffices (Figure 4).
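The interpolation of the correction factors into a correction filter, and its application per Equation 3, might be sketched as below. The function names, the `band_edges` convention, and the treatment of bins lying outside the first and last band centers (held at the nearest factor) are assumptions of this example rather than details given in the source.

```python
import numpy as np

def correction_filter(m, band_edges, n_bins):
    """Linearly interpolate per-band factors m(i) to one gain C(k) per bin.

    m(i) is pinned to the center bin of band i. Bins below the first center
    or above the last center keep the nearest factor, which is the default
    behaviour of np.interp and an assumption of this sketch.
    """
    edges = np.asarray(band_edges, dtype=float)
    centers = 0.5 * (edges[:-1] + edges[1:] - 1.0)
    return np.interp(np.arange(n_bins), centers, m)

def apply_correction(S, C):
    """Equation 3: S'(k) = S(k) C(k)."""
    return S * C

# Two bands of four bins each, with factors 0.5 and 1.0.
C = correction_filter(np.array([0.5, 1.0]), [0, 4, 8], 8)
```

Here the bins index only the bands supplied; for a full complex FFT, the same gains would also have to be applied to the mirrored negative-frequency bins so that the inverse transform stays real.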

[0032] Alternatively, an individual correction factor could be 
derived for each FFT bin (i.e., subband i corresponds to frequency 
component k), in which case no interpolation is necessary. This 
method, however, may result in a jagged rather than a smooth 
frequency behavior of the correction factors, which is often 
undesired due to the resulting time-domain distortions.

[0033] In the preferred embodiments, the summation component 
then takes an inverse FFT of the corrected summed signal S'(k) to 
obtain a time-domain signal, step 48. By applying overlap-add for 
successive corrected summed time-domain signals, step 50, the final 
summed signal s1, s2, ... is created and this is fed through to be 
encoded, step S9, Figure 1. It will be seen that the summed 
segments s1, s2, ... correspond to the segments m1, m2, ... in the 
time domain and, as such, no loss of synchronization occurs as a 
result of the summation.
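Steps 48 and 50 can be illustrated with the following sketch. It assumes that the square-root Hann window is applied a second time after the inverse FFT, so that analysis and synthesis windows multiply to a Hann window that sums to one at 50% overlap; the source names the square-root window but does not spell this out. The function names are hypothetical.

```python
import numpy as np

def overlap_add(frames, hop):
    """Overlap-add equal-length time-domain frames spaced `hop` samples apart."""
    n = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + n)
    for j, frame in enumerate(frames):
        out[j * hop:j * hop + n] += frame
    return out

def resynthesize(spectra, hop):
    """Inverse-FFT each corrected spectrum, window it again, and overlap-add."""
    n = len(spectra[0])
    # Assumed synthesis window: the same square-root periodic Hann window.
    window = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n))
    frames = [np.fft.ifft(s).real * window for s in spectra]
    return overlap_add(frames, hop)
```

With unit correction gains and frames taken every half frame length, this sketch returns the input samples unchanged wherever two frames overlap, which is consistent with the statement that no loss of synchronization occurs.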

[0034] It will be seen that where the input channel signals are 
not overlapping signals but rather continuous time signals, then 
the windowing step 42 will not be required. Similarly, if the 
encoding step S9 expects a continuous time signal rather than an 
overlapping signal, the overlap-add step 50 will not be required. 
Furthermore, it will be seen that the described method of 
segmentation and frequency-domain transformation can also be 
replaced by other (possibly continuous-time) filterbank-like 
structures. Here, the input audio signals are fed to a respective 
set of filters, which collectively provide an instantaneous 
frequency spectrum representation for each input audio signal. This 
means that sequential segments can, in fact, correspond to single 
time samples rather than blocks of samples as in the described 
embodiments.

[0035] It will be seen from Equation 1 that there are 
circumstances where particular frequency components for the left 
and right channels may cancel one another out or, if they have a 
negative correlation, may tend to produce very large correction 
factor values m^2(i) for a particular band. In such cases, a sign 
bit could be transmitted to indicate that the sum signal for the 
component S(k) is:

S(k) = L(k) - R(k)

with a corresponding subtraction used in Equations 1 or 2.

[0036] Alternatively, the components for a frequency band i 
might be rotated more into phase with one another by an angle α(i). 
The ITD analysis process S3 provides the (average) phase difference 
between (subbands of the) input signals L(k) and R(k). Assuming 
that for a certain frequency band i, the phase difference between 
the input signals is given by α(i), the input signals L(k) and R(k) 
can be transformed to two new input signals L'(k) and R'(k) prior 
to summation according to the following:

$$L'(k) = e^{j c \alpha(i)} L(k)$$

$$R'(k) = e^{-j (1 - c) \alpha(i)} R(k)$$

with c being a parameter which determines the distribution of phase 
alignment between the two input channels (0 ≤ c ≤ 1).
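The phase rotation may be sketched as follows, under stated assumptions. Here α(i) is taken to be the phase of R relative to L and is estimated from the complex cross-spectrum of the band; both the estimator and the sign convention are assumptions of this example, as is the function name `align_band`.

```python
import numpy as np

def align_band(L, R, c=0.5):
    """Rotate the bins of one band towards each other before summation.

    L, R : complex bins of the band for each channel.
    c    : share of the rotation applied to L (0 <= c <= 1); R takes the rest.
    """
    # Assumed estimator: phase of R relative to L over the whole band.
    alpha = np.angle(np.sum(R * np.conj(L)))
    L_rot = np.exp(1j * c * alpha) * L
    R_rot = np.exp(-1j * (1.0 - c) * alpha) * R
    return L_rot, R_rot

# Anti-phase channels cancel when summed directly but not after alignment.
L = np.ones(4, dtype=complex)
R = -L
L_rot, R_rot = align_band(L, R)
print(np.abs(L + R), np.abs(L_rot + R_rot))
```

With c = 0.5 the rotation is shared equally between the channels; c = 0 would leave L untouched and rotate only R.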

[0037] In any case, it will be seen that where, for example, two 
channels have a correlation of +1 for a sub-band i, then m^2(i) will 
be 1/4 and so m(i) will be 1/2. Thus, the correction factor C(k) for 
any component in the band i will tend to preserve the original 
energy level by tending to take half of each original input signal 
for the summed signal. However, as can be seen from Equation 1, 
where a frequency band i of a stereo signal includes spatial 
properties, the energy of the signal S(k) will tend to be smaller 
than if the channels were in phase, while the sum of the energies of 
the L, R signals will tend to stay large, and so the correction 
factor will tend to be larger for those signals. As such, overall 
energy levels in the sum signal will still be preserved across the 
spectrum, in spite of frequency-dependent correlation in the input 
signals.
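As a worked illustration of Equation 1, consider the two limiting cases below. The assumption that both channels carry the same band energy E is made for the example only.

```latex
% Assumption: equal band energies, E = \sum_{k \in i} |L(k)|^2 = \sum_{k \in i} |R(k)|^2.
% Fully correlated and in phase (L = R), so \sum_{k \in i} |S(k)|^2 = 4E:
m^2(i) = \frac{E + E}{2 \cdot 4E} = \frac{1}{4}, \qquad m(i) = \frac{1}{2}
% Uncorrelated (\rho_{LR}(i) = 0), so \sum_{k \in i} |S(k)|^2 = 2E:
m^2(i) = \frac{E + E}{2 \cdot 2E} = \frac{1}{2}, \qquad m(i) = \frac{1}{\sqrt{2}}
% In both cases the corrected band energy equals the mean input energy:
m^2(i) \sum_{k \in i} |S(k)|^2 = \tfrac{1}{2} \sum_{k \in i} \{ |L(k)|^2 + |R(k)|^2 \} = E
```
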

[0038] In a second embodiment, the extension towards multiple 
(more than two) input channels is shown, combined with possible 
weighting of the input channels mentioned above. The 
frequency-domain input channels are denoted by X_n(k), for the k-th 
frequency component of the n-th input channel. The frequency 
components k of these input channels are grouped in frequency 
bands i. Subsequently, a correction factor m(i) is computed for 
subband i as follows:

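$$m^2(i) = \frac{\sum_n \sum_{k \in i} |w_n(k) X_n(k)|^2}{N \sum_{k \in i} \left| \sum_n w_n(k) X_n(k) \right|^2}$$

where N is the number of input channels.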



[0039] In this equation, w_n(k) denote frequency-dependent 
weighting factors of the input channels n (which can simply be set 
to +1 for linear summation). From these correction factors m(i), a 
correction filter C(k) is generated by interpolation of the 
correction factors m(i) as described in the first embodiment. Then 
the mono output channel S(k) is obtained according to:



$$S(k) = C(k) \sum_n w_n(k) X_n(k)$$

[0040] It will be seen that, using the above equations, the 
weights of the different channels do not necessarily sum to +1; 
however, the correction filter automatically corrects for weights 
that do not sum to +1 and ensures (interpolated) energy 
preservation in each frequency band.
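A sketch of the band correction factor for the multi-channel case is given below. It assumes that the energy of the weighted sum is scaled by the number of channels, which reduces to Equation 1 for two channels with unit weights; the function name, the array layout, and the floor against division by zero are likewise assumptions of this example.

```python
import numpy as np

def downmix_band_factor(X, w, lo, hi):
    """Correction factor m(i) for bins lo:hi of a weighted N-channel sum.

    X, w : arrays of shape (N, K) holding the spectra X_n(k) and the
           weights w_n(k) of each channel.
    """
    weighted = w[:, lo:hi] * X[:, lo:hi]
    e_in = np.sum(np.abs(weighted) ** 2)
    e_sum = np.sum(np.abs(weighted.sum(axis=0)) ** 2)
    n_channels = X.shape[0]
    # The floor is an assumption; it only guards against complete cancellation.
    return np.sqrt(e_in / (n_channels * max(e_sum, 1e-12)))

# Three identical channels with unit weights.
X = np.ones((3, 4), dtype=complex)
w = np.ones((3, 4))
m = downmix_band_factor(X, w, 0, 4)
```

The factor returned for each band would then be interpolated into C(k) and multiplied with the weighted sum, as described for the first embodiment.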